
Mark HTTP/1.1 async connection as not open (non-reusable) as soon as it becomes closed by the opposite endpoint #543


Closed

Conversation

ok2c
Member

@ok2c ok2c commented Aug 7, 2025

@rschmitt This should reduce the window of the stale connection lease race condition somewhat.

@ok2c ok2c requested a review from rschmitt August 7, 2025 16:17
@@ -277,13 +278,14 @@ public final void onInput(final ByteBuffer src) throws HttpException, IOExceptio
return;
}

boolean endOfStream = false;
if (incomingMessage == null) {
final int bytesRead = inbuf.fill(ioSession);
Contributor

I think there's a key issue right here: we can't read bytes _and_ endOfStream in the same operation. So even if the closure of the connection is already known to the host (at some level of abstraction), this code doesn't learn about it until the next event loop iteration, by which point the connection has already been returned to the pool and potentially reused.

I guess one question here is: should we read in a loop until bytesRead comes back as either 0 or -1? That would be a kind of edge-triggered approach, where you drain the file descriptor until read() reports would-block (EAGAIN). If possible, I think that approach, combined with this change set, would greatly reduce or eliminate purely internal race conditions in the client's connection management.
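For illustration, the drain-until-would-block idea can be sketched with plain java.nio, with no HttpCore types involved (`drain` is a hypothetical helper, and an in-memory channel stands in for the socket): reading in a loop lets a single pass observe both the payload and the end of stream, instead of deferring the EOF to the next event-loop iteration.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class DrainLoop {

    // Keep reading until the channel reports "no data right now" (0) or
    // end of stream (-1), so both the payload and the EOF are seen in a
    // single pass rather than on the next event-loop iteration.
    static int[] drain(final ReadableByteChannel ch, final ByteBuffer buf) throws Exception {
        int total = 0;
        int n;
        while ((n = ch.read(buf)) > 0) {
            total += n;
            buf.clear(); // pretend the caller consumed the bytes
        }
        return new int[] { total, n }; // n is 0 (would block) or -1 (EOF)
    }

    public static void main(final String[] args) throws Exception {
        final ReadableByteChannel ch =
                Channels.newChannel(new ByteArrayInputStream(new byte[] { 1, 2, 3, 4, 5 }));
        final int[] r = drain(ch, ByteBuffer.allocate(2));
        System.out.println("drained=" + r[0] + " last=" + r[1]); // drained=5 last=-1
    }
}
```

With a real non-blocking socket the loop would terminate with 0 when the descriptor has no more data, and with -1 only once the peer's FIN has actually been delivered; the sketch only demonstrates the control flow.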

Member Author

> I guess one question here is: should we read in a loop until bytesRead comes back as either 0 or -1?

@rschmitt The protocol handler should be reading incoming data greedily until there is no more data to be read. I will review the code tomorrow to make sure it is really so.

Member Author

@rschmitt You are right. The protocol handler reads message body content greedily but stops at the end of the message and waits for the next input event. I am looking into it.

Member Author

@rschmitt Please test the updated change-set in your test environment and let me know if it improves the situation.

Contributor

I ran my tests against your change set. Here are my findings:

  1. Uncontended connection reuse is moderately more reliable with HTTP, but not with HTTPS.
  2. Contended connection reuse is the same, with failure rates of 67% for small batches and 50% for large batches.

I've pushed my reproducer here so you can experiment with it. One thing I quickly discovered is that your change set is ineffective on HTTPS because SSLIOSession::read is just returning the same cached value every time:

    @Override
    public int read(final ByteBuffer dst) {
        return endOfStream ? -1 : 0;
    }

This is altogether different from HTTP, where I've confirmed that the duplexer reads endOfStream on the second call to fillBuffer().

To me, the biggest mystery is why contended connection reuse is failing at such a consistently high rate:

Core version: 5.4-alpha1-SNAPSHOT
Client version: 5.6-alpha1-SNAPSHOT

Http: Sequential requests (rapid): 2,496 succeeded; 4 failed (99.84% success rate)
Http: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
Http: Single large batch: 15 succeeded; 15 failed (50.00% success rate)
Http: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate)

Https: Sequential requests (rapid): 2,235 succeeded; 265 failed (89.40% success rate)
Https: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
Https: Single large batch: 15 succeeded; 15 failed (50.00% success rate)
Https: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate)

One fascinating detail is that your change set caused the HTTP batch tests to see RequestNotExecutedException, where previously they saw ConnectionClosedException. Before:

Http: Sequential requests (rapid): 2,479 succeeded; 21 failed (99.16% success rate)
  14 not executed, 6 closed, 0 reset, 1 other
Http: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
  0 not executed, 0 closed, 0 reset, 0 other
Http: Single large batch: 15 succeeded; 15 failed (50.00% success rate)
  0 not executed, 15 closed, 0 reset, 0 other
Http: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate)
  0 not executed, 5 closed, 0 reset, 0 other

After:

Http: Sequential requests (rapid): 2,491 succeeded; 9 failed (99.64% success rate)
  9 not executed, 0 closed, 0 reset, 0 other
Http: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
  0 not executed, 0 closed, 0 reset, 0 other
Http: Single large batch: 15 succeeded; 15 failed (50.00% success rate)
  15 not executed, 0 closed, 0 reset, 0 other
Http: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate)
  5 not executed, 0 closed, 0 reset, 0 other

This must mean that the connection is being returned to the pool the very instant the HTTP response is read, before the second fillBuffer() call takes place (or it means that the new endOfStream check is ineffective).

Member Author

> This must mean that the connection is being returned to the pool the very instant the HTTP response is read

@rschmitt This is precisely what the protocol handler does. As soon as the response message has been fully consumed and the connection is deemed reusable, the protocol handler hands it back to the connection pool, which may immediately lease it out to another caller. No amount of trickery can make the protocol handler aware of the opposite endpoint's intent not to honor the protocol requirements with regard to connection persistence, especially when TLS is used to negotiate the connection closure. It can get lucky and read the end of stream over a plain connection, but that is just luck.

There is no solution to this problem other than a request recovery and retry mechanism, and possibly some sort of stale connection check. I suggest we focus on that.
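The recovery-and-retry mechanism could be sketched in plain Java. This is a hypothetical stand-alone sketch, not the HttpClient API: `NotExecutedException` stands in for the client's retryable `RequestNotExecutedException`, and `executeWithRetry` is an invented helper.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {

    // Marker for a request that failed before any bytes were written to
    // the wire, and is therefore always safe to re-execute (analogous to
    // RequestNotExecutedException thrown on a stale pooled connection).
    static class NotExecutedException extends IOException {
    }

    // Re-execute the request whenever it fails with the safely retryable
    // marker exception, up to maxRetries additional attempts.
    static <T> T executeWithRetry(final Callable<T> request, final int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return request.call();
            } catch (final NotExecutedException ex) {
                if (attempt >= maxRetries) {
                    throw ex; // out of attempts; propagate the failure
                }
                // Stale connection was leased; discard it and try again.
            }
        }
    }

    public static void main(final String[] args) throws Exception {
        final int[] calls = { 0 };
        final String result = executeWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new NotExecutedException(); // first two leases are stale
            }
            return "200 OK";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts"); // 200 OK after 3 attempts
    }
}
```

The design point is that only failures known to have occurred before the request was sent are retried unconditionally; anything that may have reached the server needs idempotency considerations first.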

Contributor

Your changes in this PR significantly improved things by causing RequestNotExecutedException (a safely retryable exception) to be thrown much more consistently in this case. I was hoping we could figure out how to do the same thing in the case of TLS.

Member Author

> I was hoping we could figure out how to do the same thing in the case of TLS.

@rschmitt I do not think this can be done, not without a major rewrite of the TLS layer. Presently the TLS layer and the HTTP protocol layer are decoupled, and the protocol layer interacts with the lower transport layer through a generic IOSession interface. I do not see a way of reliably running the TLS close_notify handshake from the protocol handler.

I really think we ought to focus on #547 or something similar (such as using OPTIONS as a poor man's version of H2 ping)

Contributor

OK, fair enough. In that case, I'm indifferent as to whether we ship the present PR. Plaintext HTTP is almost irrelevant today.

I do have some unfinished business involving a bug in UDS connection closure with CloseMode.GRACEFUL, and I think it might be related to the things we're talking about in this thread. I'll try to get back to that this week.

@ok2c ok2c force-pushed the mark_half_closed_async_http1_as_not_open branch from df89f06 to 24ccaad Compare August 8, 2025 09:05
@ok2c ok2c force-pushed the mark_half_closed_async_http1_as_not_open branch from 24ccaad to dfa2cd5 Compare August 8, 2025 10:40
@ok2c ok2c closed this Aug 9, 2025
@ok2c ok2c deleted the mark_half_closed_async_http1_as_not_open branch August 15, 2025 18:54